Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?

Authors

  • Alexandre Labadié
  • Violaine Prince
Abstract

The goal of this paper is to demonstrate that the usual evaluation methods for text segmentation are not suited to every task linked to text segmentation. To do so, we differentiate the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics inside a single text. We worked on a corpus of twenty-two French political discourses, trying to find the boundaries between them when they are concatenated and the topic boundaries inside them when they are not. We compared the results of our distance-based method to those of the well-known C99 algorithm.

Similar resources

An Automatic Method of Finding Topic Boundaries

This article outlines a new method of locating discourse boundaries based on lexical cohesion and a graphical technique called dotplotting. The application of dotplotting to discourse segmentation can be performed either manually, by examining a graph, or automatically, using an optimization algorithm. The results of two experiments involving automatically locating boundaries between a series o...

Artificial General Segmentation

We argue that the ability to find meaningful chunks in sequential input is a core cognitive ability for artificial general intelligence, and that the Voting Experts algorithm, which searches for an information theoretic signature of chunks, provides a general implementation of this ability. In support of this claim, we demonstrate that VE successfully finds chunks in a wide variety of domains, ...

Recrystallization texture during ECAP processing of ultrafine/nano grained magnesium alloy

An ultrafine/nano grained AZ31 magnesium alloy was produced through four-pass ECAP processing. TEM microscopy indicated that recrystallized regions included nano grains of 75 nm. Pole figures showed that a fiber basal texture with two-pole peaks was developed after four passes, where a basal pole peak lies parallel to the extrusion direction (ED) and the other ~20° away from the transverse dire...

Reservoir Rock Characterization Using Wavelet Transform and Fractal Dimension

The aim of this study is to characterize and find the location of geological boundaries in different wells across a reservoir. Automatic detection of the geological boundaries can facilitate the matching of the stratigraphic layers in a reservoir and finally can lead to a correct reservoir rock characterization. Nowadays, the well-to-well correlation with the aim of finding the geological l...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Publication date: 2008